(54) Method and apparatus for transmitting a video image



(57) A method of transmitting a video image including an
object of interest comprises capturing a sequence of images
in which the object of interest occupies a fraction of each
image, tracking the object of interest by selecting and
extracting a region of each image including the object of
interest, and coding only the selected region of each
captured image.
Description

[0001] The invention relates to a method and apparatus for
transmitting a video image. More particularly, the invention
relates to transmission of a video image including an object
of interest, such as a face in communications using a mobile
video-phone.
[0002] In mobile video communication, often the video camera
is hand-held and moves relative to the subject. That is
particularly the case, for example, in mobile phone based
video communication, where the user has to direct a camera
linked to the phone handset to point to his own face. This
can cause the problem that, because of head and hand
movements, the outline of the user's face moves within the
frame of the image captured by the camera, possibly even
moving outside the frame. One solution for preventing the
outline of the head from moving outside the frame is to
adjust the focal length of the camera so that the outline of
the head occupies a relatively small fraction of the frame.
As a result, the probability that the head stays within the
frame of the image is increased. However, the resolution of
the face image is decreased and so the quality of the
video-link is perceived to be poor.

[0003] US 4,951,140 discloses a device for encoding image
data including a face, where the face region is detected and
more bits are allocated to the extracted face region than to
the rest of the image, to achieve a better quality image of
the face region.
[0004] According to a first aspect, the invention provides a
method of transmitting a video image including an object of
interest comprising capturing a sequence of images in which
the object of interest occupies a fraction of each image,
tracking the object of interest by selecting and extracting
a region of each image including the object of interest, and
coding only the selected region of each captured image.

[0005] By arranging that the object of interest occupies a
small fraction of the image, the probability that the object
of interest stays within the frame of the captured image is
increased. At the same time the object of interest occupies
a relatively high fraction of the region that is coded. As a
result of the invention, the amount of data to be coded is
reduced. Even if the object of interest moves around within
the frame of the captured image, the extracted region
follows the object, so that the object moves less within the
extracted region. Preferably, the object of interest is
centred in the extracted region, so that the object is
stable within the extracted regions.
[0006] Preferably, the extracted region is displayed at 
a resolution lower than the resolution of the captured 
image. Because the object of interest occupies a rela- 
tively large proportion of the terminal display, the user 
perceives that the quality of the image is improved. 
[0007] According to a second aspect, the invention provides
a method of transmitting a video image including an object
of interest comprising selecting a region of the image
including the object of interest, the selected region being
of a predetermined size, and coding the selected region.

[0008] As above, the object of interest occupies a rel- 
atively high fraction of the region that is coded. 

[0009] Preferably, only the selected region is coded and the
rest of the captured image is discarded. As a result, the
amount of information to be coded and transmitted is
reduced. Preferably, the selected region corresponds to a
predetermined image format having fewer pixels than the
capture image of the camera. Consequently, the selected
region can be coded and displayed using known coders and
displays for the known image format without further
processing to adapt the extracted region to the specific
format. This reduces the amount of processing required.

[0010] According to a third aspect, the invention provides a
method of transmitting a video image including an object of
interest comprising selecting a region of the image greater
than the object of interest by a predetermined degree, and
coding said region.

[0011] The third aspect of the invention offers advantages
similar to those of the first and the second aspects. Also,
more specifically, the size of the selected region varies
with the size of the object within the image (which changes
for example as the object moves relatively towards or away
from the camera) so that the ratio of object data to
background data stays approximately the same.

[0012] According to a fourth aspect, the invention provides
a method of operating a video camera comprising arranging
the camera so that an object of interest occupies a fraction
of the area of the captured image, tracking movement of the
object of interest within the captured image, selecting and
extracting a region of interest around the object of
interest and displaying only the extracted part of the
captured image.
[0013] As a result of the invention, it is easier to keep
the face of the user within the image captured by the camera
while also maintaining a high quality displayed image at the
receiving terminal. Also, the amount of information to be
coded may be reduced compared with the prior art. Also,
effects of the invention can be obtained using certain
standard components such as standard encoders.

[0014] Embodiments of the invention will be described with
reference to the accompanying drawings of which:

Fig. 1 is a block diagram of a mobile video communication
system;

Fig. 2 is a block diagram showing the image processing
circuit of Fig. 1 in more detail;

Fig. 3 is a block diagram of an image processing circuit in
a second embodiment;

Fig. 4 is a block diagram of an image processing circuit in
a third embodiment.

[0015] An example of an application of the present invention
is a mobile video phone communication system. Components of
such a system are shown in block diagram form in Fig. 1.

[0016] A mobile phone (not shown) includes a camera 2 for
capturing images of the user. The camera 2 is a known type
of camera for use in mobile video phones and is part of the
phone handset. In an alternative embodiment, the camera is a
separate component connected to the phone handset, for
example, by a lead or by wireless communication. The camera
digitises images at CIF resolution (352x288 pixels). The
optical system of the camera is chosen so that in use the
face of the user occupies approximately a predetermined
fraction of the target image resolution, which is the
resolution of the display 14. Here, the resolution of the
display corresponds to QCIF format (176x144 pixels). In this
embodiment, the optical system is configured so that in
normal use the face occupies approximately 80% of the target
resolution. Of course, the actual fraction of the image
occupied by the face of the user will in use depend on
various factors, such as the size of the face of the user
and where the camera is actually held. Accordingly, the
configuration of the camera including the focal length of
the optical system is determined on the basis of statistical
information representing, amongst other things, the average
size of people's faces and what is considered a comfortable
distance from the face for holding the camera.

[0017] The camera is connected to a signal processor 4 for
processing signals received from the camera 2 representing
the captured image. The signal processor 4 is shown in more
detail in Fig. 2. The signal processor includes a face
detection module 16, for detecting the size and position of
the face or head in the captured image, a face tracking
module 18, for tracking the face as it moves in the image, a
region selection circuit 20, for selecting a specific region
of the image, and a face region extraction module 22.
Face-detection circuits and face tracking circuits are known
and described, for example, in G. Burel and D. Carel -
Detection and Localisation of faces on digital images,
Pattern Recognition Letters, 15: 963-967, October 1994 and
in Lars-Peter Bala, Kay Talmi and Jin Liu - Automatic
Detection and Tracking of Faces and Facial Features in Video
Sequences, Picture Coding Symposium 1997, 10-12 September
1997, Berlin, Germany, the contents of which are
incorporated by reference. The signal processor 4 operates
to select and extract a desired region of the image
including the face region, as will be described in more
detail below.
[0018] An output of the signal processor 4 is connected to
an encoder 6, for encoding signals representing the
extracted region of the image signal. The encoder 6 is a
known encoder. The encoder is connected to a transmitter 8,
for transmitting the coded signal in a known manner.



[0019] The receiving side of the system is a receiving
terminal in the form of a second mobile phone (not shown).
The second phone includes a known receiver 10 for receiving
the transmitted signal, a decoder 12 connected to the
receiver for decoding the received signal, and a display 14
for displaying the received image in QCIF format.

[0020] In operation, an image is captured by the camera 2,
and the resulting signals are input to the signal processor
4. The image is analysed by the face-detection module 16,
which determines the position and size of the face within
the image in a known manner.
[0021] Information regarding the location and size of the
face is input from the face-detection module 16 to the
region selection circuit 20, which determines the size and
location of the window to be selected from the main image.
In this embodiment, the region selection circuit 20 is
configured to select a window of a predetermined size
centred on the face. More specifically, the region selection
circuit selects a window having the same resolution as the
display. Thus, in this case, the region selection circuit is
configured to select a region sized 176x144 pixels, centred
on the face region. The centre can be defined and determined
in any suitable manner. In this embodiment, the centre of
the face is the mid-point based on the extremes vertically
and horizontally of the flesh-region.
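
The window selection just described can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions: the function names, the (left, top, right,
bottom) bounding-box convention and the sample face
coordinates are hypothetical and do not come from the
embodiment, and nothing is done here about a window that
would extend beyond the captured frame.

```python
QCIF_W, QCIF_H = 176, 144  # display resolution named in the text


def face_centre(left, top, right, bottom):
    """Mid-point of the horizontal and vertical extremes of the face region."""
    return (left + right) // 2, (top + bottom) // 2


def select_window(centre, win_w=QCIF_W, win_h=QCIF_H):
    """Return (left, top, width, height) of a fixed-size window centred on `centre`."""
    cx, cy = centre
    return cx - win_w // 2, cy - win_h // 2, win_w, win_h


# Hypothetical example: a 100x150 face region in the middle of a 352x288 frame.
centre = face_centre(126, 69, 226, 219)
window = select_window(centre)
```

The window size is fixed by the display format, so only its
position changes from frame to frame.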

[0022] Because of the set up of the optical system of the
camera 2, as explained above, the face region occupies
approximately 80% of the selected window (in normal use),
that is approximately 100x150 pixels. Thus, in normal
operation, assuming the face is in the centre of the CIF
image, there is a boundary around the selected region of 126
pixels in the vertical direction and 69 pixels in the
horizontal direction. Thus, even if the outline of the face
is displaced horizontally or vertically because of head
and/or camera movements, it will still be reflected within
the CIF image as long as displacement is less than the
distances mentioned above. For the above example, the
vertical face displacement in the image plane can be 1.26
times the face width. To achieve a similar coverage in a
conventional system with QCIF resolution, the width of a
face would have to be 50 pixels.
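
The figures in this paragraph can be checked with a little
arithmetic. The sketch below assumes that the 100x150 face
figure means 100 pixels along the 352-pixel axis of the CIF
frame and 150 pixels along the 288-pixel axis; that reading
is an assumption, chosen because it reproduces the 126, 69
and 1.26 figures. The function and variable names are
illustrative only.

```python
CIF_W, CIF_H = 352, 288    # captured frame
QCIF_W, QCIF_H = 176, 144  # conventional capture size used for comparison


def centred_margin(frame_extent, face_extent):
    """Free space on each side of a face centred along one axis of the frame."""
    return (frame_extent - face_extent) / 2


margin_352_axis = centred_margin(CIF_W, 100)  # margin along the 352-pixel axis
margin_288_axis = centred_margin(CIF_H, 150)  # margin along the 288-pixel axis

# Allowed displacement expressed as a multiple of the 100-pixel face extent.
cif_ratio = margin_352_axis / 100

# A 50-pixel face centred in a 176-pixel-wide frame gives the same multiple.
qcif_ratio = centred_margin(QCIF_W, 50) / 50
```

Under this reading the two ratios agree, which is the sense
in which a 50-pixel face in a QCIF capture gives similar
coverage.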

[0023] The face region extraction module 22 receives signals
from the camera and from the region selection circuit and
extracts the window including the face region from the image
from the camera. The extracted window is then transferred to
the standard QCIF coder 6 for coding using a suitable known
coding method. The remainder of the image is discarded.
These steps are performed for each frame of the captured
video images, the face being tracked by the tracking circuit
to reduce the amount of processing required.
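
As a rough illustration of the extraction step, the sketch
below crops a window out of a frame held as a list of pixel
rows. The frame representation, the function name and the
synthetic test frame are assumptions made for the example;
the embodiment itself does not specify how the frame is
stored.

```python
def extract_window(frame, left, top, width, height):
    """Copy the selected window out of the frame; the rest is simply not kept."""
    return [row[left:left + width] for row in frame[top:top + height]]


# Synthetic CIF-sized frame whose pixel value records its own (x, y) position.
frame = [[(x, y) for x in range(352)] for y in range(288)]

# Window of QCIF size with its top-left corner at (88, 72).
window = extract_window(frame, 88, 72, 176, 144)
```

Only `window` would be handed on to the coder, so the data
to be coded is a quarter of the captured frame.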

[0024] As the face-region moves within the captured image,
the extracted window also moves around the captured image.
Because the face detection module is supported by the face
tracking module, it is not necessary to do a full
face-detection process in each frame and thus the amount of
processing is reduced. Even more importantly, since the
tracking module usually uses low-level image features for
tracking, such as local image intensity variations or edges,
the overall system can achieve good results in stabilising
the position of the face (or other objects). It is
preferable not to solely rely on object detection (e.g.
pattern-recognition techniques) for tracking and
stabilisation of the moving object (say face), because it
gives unacceptably poor results. Because the region
selection circuit 20 is configured to select the window
centred on the face region, and it uses accurate movement
information from the tracking region, the face is reliably
stabilised within the extracted window. Thus, even if the
head moves within the captured image, it does not move
within the extracted window. This is less distracting for
the viewer.
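
The stabilising effect can be illustrated with a short
sketch. It assumes that a tracker supplies one face centre
per frame; the list of centres below is hypothetical, and
the function name is illustrative only. Because the window
origin is recomputed from each tracked centre, the face
centre always lands at the same position inside the window.

```python
def stabilised_windows(centres, win_w=176, win_h=144):
    """For each tracked centre, return the window origin in the captured
    frame and the face centre expressed in window coordinates."""
    out = []
    for cx, cy in centres:
        left, top = cx - win_w // 2, cy - win_h // 2
        out.append(((left, top), (cx - left, cy - top)))
    return out


# Hypothetical tracker output for three consecutive frames.
tracked = [(176, 144), (190, 150), (160, 130)]
result = stabilised_windows(tracked)
in_window = [centre for _, centre in result]
```

The window origin moves with the face, while every entry of
`in_window` is the same point.
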
[0025] The coded signal is transmitted, received and decoded
by the receiving terminal, which displays the image of the
face in QCIF format. Because of the process of selection of
a region of the captured image which has the face in the
middle of the region and which is of QCIF resolution, the
displayed image has the face in the middle and is the
correct resolution for the display. Also, the face is
displayed as a higher fraction of the image than in the
captured image, which gives the impression of better
resolution.

[0026] A second embodiment of the invention will be 
described with reference to Fig. 3. The second embod- 
iment corresponds to the first embodiment but has a re- 
gion extraction and scale module 24 in place of the re- 
gion extraction module 22. 

[0027] In embodiment 2, a region surrounding a face-region
is selected as in the first embodiment. However, the
extracted region is also scaled to compensate for variations
in the size of the face region resulting from movements of
the user relatively towards and/or away from the camera. In
other words, the extraction and scale module 24 also
performs a digital zoom procedure on the extracted region.
Scaling is performed so that the fraction of the extracted
region occupied by the face region is approximately the same
in each successive extracted region. Scaling is based on the
information from the tracking module. In this embodiment the
tracking module is generalised so that it also provides
additional information about the scale of the object.
Coding, transmission and display are carried out as in the
first embodiment. The second embodiment has the advantage
that it results in a more stable image that is less
distracting to the viewer.
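
One way to picture the scaling is sketched below. It assumes
that the tracking module reports the current face width in
pixels and that the face should appear 100 pixels wide in
the output, a reference value borrowed from the earlier
example; both the reference value and the function name are
assumptions, and the embodiment does not prescribe this
particular calculation.

```python
def zoom_window(face_w, ref_face_w=100, out_w=176, out_h=144):
    """Size of the source region which, once resampled to out_w x out_h,
    shows a face of width face_w at the reference width ref_face_w."""
    scale = ref_face_w / face_w  # digital zoom factor applied to the region
    return round(out_w / scale), round(out_h / scale), scale


same = zoom_window(100)     # face at the reference size: no zoom needed
far = zoom_window(50)       # face further away: smaller region, magnified
near = zoom_window(200)     # face closer: larger region, reduced
```

In each case the face takes up the same share of the output
once the region is resampled to 176x144.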

[0028] A third embodiment of the invention will be de- 
scribed with reference to Fig. 4. 
[0029] The third embodiment corresponds to the first or
second embodiments, subject to modifications to the region
selection circuit and the region extraction module. Also,
the region selection circuit 20' has a user input.
[0030] In this embodiment, the region selection circuit 20'
operates to select a region around the face region such that
the face region occupies a predetermined fraction of the
selected region. The predetermined fraction is selected by
the user by way of the user input, in the form of a
keyboard. In an alternative embodiment, the fraction may be
fixed by the manufacturer. In this example, it has been
selected that the face region occupies 80% of the selected
region. In other words, the size of the selected region is
125% of the face region.
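
The relation between the 80% and 125% figures is simply
1 / 0.8 = 1.25. The sketch below works this through under
two assumptions that the text does not state: that "size"
refers to area, and that the selected rectangle keeps the
aspect ratio of the face region. The function name and the
sample face size are illustrative.

```python
import math


def region_for_fraction(face_w, face_h, fraction=0.8):
    """Rectangle with the face's aspect ratio whose area is face area / fraction."""
    k = math.sqrt(1 / fraction)  # linear scale; the area scales by 1 / fraction
    return face_w * k, face_h * k


region_w, region_h = region_for_fraction(100, 150)
area_ratio = (region_w * region_h) / (100 * 150)
```

With a fraction of 0.8 the region area comes out at 1.25
times the face area, so each side is about 11.8% longer than
the corresponding side of the face region.
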
[0031] The face detection and tracking modules 16, 18 detect
and track the face region as in embodiments 1 and 2. The
region selection circuit 20' then selects a region around
the face region in accordance with the preferences. Here,
the region is a rectangular region, scaled in relation to
the face region, and centred on the face.

[0032] The selected region is then extracted by the region
extraction module 22'. The size of the extracted region in
pixels is dependent on the size of the face in the captured
image, and it may vary, for example, as the head moves
closer to or further away from the camera. Thus, the
selected region is scaled to a predetermined size in the
region extraction module. Here, the region is scaled to QCIF
format, so that it can then be coded using a standard QCIF
encoder 6. Alternatively, the captured image can be
subjected to digital zoom before the face region is
extracted so that the size of the face, and hence the size
of the extracted region, is the same in each frame.
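
A minimal sketch of scaling an extracted region to QCIF size
follows. Nearest-neighbour resampling is used here purely
because it is short; the text does not say which scaling
method the region extraction module uses, and the 220x180
input region and the function name are assumptions for the
example.

```python
def resize_nearest(region, out_w=176, out_h=144):
    """Resample a list-of-rows image to out_w x out_h by nearest neighbour."""
    in_h, in_w = len(region), len(region[0])
    return [
        [region[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Hypothetical extracted region whose pixels record their own (x, y) position.
region = [[(x, y) for x in range(220)] for y in range(180)]
scaled = resize_nearest(region)
```

Whatever the size of the extracted region, the result always
has the fixed dimensions expected by a QCIF encoder.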

[0033] Subsequently, the coded signal is transmitted and
displayed as described above.

[0034] The above embodiments have been described in relation
to mobile video phone communication. The invention may also
be used in other applications, such as in video-conferencing
and transmission of video images from cameras connected to
personal computers. The embodiments describe selection of a
region including the face of a speaker as an object of
interest, but the invention can be applied in relation to
any other object of interest. The invention has been
described using CIF and QCIF, but other formats may be used.
References to certain specific formats include
modifications, such as rotations, of those formats. For
example, QCIF format has a greater width than height
(similar to "landscape" configuration in relation to
printing on paper). In the context of a person's face, QCIF
format rotated by 90°, that is, so it has greater height
than width (like "portrait" configuration), is preferable,
so that the face occupies a larger proportion of the
selected region and less space is allocated to less
important parts of the selected region. Similar
considerations apply to the choice of the selected and/or
displayed region, with reference to the nature, especially
shape, of the object of interest, even if the selected
and/or displayed region is not in accordance with specific
known formats. In embodiment 3, instead of selecting a
region that is a certain percentage greater than the
face-region, the selected region could be a predetermined
amount greater, for example, longer and wider by a certain
number of pixels.
Claims

1. A method of transmitting a video including an object of
interest comprising capturing a sequence of images in which
the object of interest occupies a fraction of each image,
tracking movement of the object of interest and selecting
and extracting a region of each image including the object
of interest, and coding only the selected region of each
captured image.

2. A method as claimed in claim 1 comprising stabilis- 
ing the object of interest within the extracted region. 

3. A method as claimed in claim 2 wherein the extracted
region is selected so that the object of interest is centred
within the extracted region.

4. A method as claimed in any one of claims 1 to 3
comprising transmitting the coded region, and decoding and
displaying the selected region.

5. A method as claimed in claim 4 wherein the extracted
region is displayed in a format comprising fewer pixels than
the format of the captured image.

6. A method as claimed in any one of claims 1 to 5 in 
which the object of interest occupies less than a pre- 
determined fraction of each image. 

7. A method as claimed in any one of claims 1 to 5 in 
which the object of interest occupies a small fraction 
of each image. 

8. A method of processing a video including an object of
interest in a sequence of images comprising selecting a
region of an image including the object of interest, the
selected region being of a predetermined size, and coding
the selected region.

9. A method as claimed in claim 8 wherein only the selected
region is coded and the rest of the captured image is
discarded.

10. A method as claimed in any preceding claim wherein the
selected region corresponds to a predetermined image format
having fewer pixels than the format of the image capture of
the camera.

11. A method as claimed in claim 10 wherein the captured
image is in CIF format and the selected region is in QCIF
format.

12. A method as claimed in any preceding claim wherein the
selected region is scaled to compensate for movements of the
object of interest backwards and forwards relative to the
camera.

13. A method as claimed in any of claims 8 to 12 wherein the
object of interest is stabilised within the selected region.

14. A method as claimed in claim 13 wherein the selected
region is such that the object of interest is centred in the
selected region.

15. A method of processing a video including an object of
interest in a sequence of images comprising selecting a
region of an image including the object of interest and
which is greater than the area occupied by the object of
interest by a predetermined degree, and coding said region.

16. A method as claimed in any preceding claim wherein the
object of interest occupies a predetermined percentage of
the selected region.

17. A method as claimed in any preceding claim comprising
scaling the selected region to a predetermined size.

18. A method as claimed in claim 17 wherein the
predetermined size corresponds to a known format.

19. A method as claimed in claim 18 wherein the captured
image is in CIF format and the extracted region is scaled to
QCIF format.

20. A method of transmitting video images comprising
processing video images according to a method as claimed in
any one of claims 1 to 19, transmitting the encoded image
data, and receiving, decoding and displaying the image data.

21. A method of operating a video camera comprising
arranging the camera so that an object of interest occupies
a fraction of the area of the captured image, tracking
movement of the object of interest within the captured
image, selecting and extracting a region of interest around
the object of interest and displaying only the extracted
part of the captured image.

22. A method as claimed in any preceding claim comprising
compensating for changes in size of the object in the
sequence of images.

23. An image processing circuit comprising means for
extracting a region of each image in a sequence of images
including an object of interest and coding only the selected
region of each captured image.

24. An image processing circuit comprising means for
selecting a region of an image including an object of
interest, the selected region being of a predetermined size,
and coding the selected region.

25. An image processing circuit comprising means for
selecting a region of the image such that the object of
interest occupies a predetermined percentage of the region,
and for coding said region.

26. A circuit as claimed in any one of claims 23 to 25 
comprising means for tracking movement of the ob- 
ject of interest in a sequence of images. 

27. A circuit as claimed in any one of claims 23 to 26
comprising means for compensating for changes of size of the
object of interest in the sequence of images.

28. A video image processing circuit comprising means for
performing a method as claimed in any one of claims 1 to 22.

29. A video image processing device comprising a cam- 
era and a circuit as claimed in any one of claims 23 
to 28. 

30. A mobile phone comprising a circuit as claimed in any
one of claims 23 to 28 or a device as claimed in claim 29.